The CUHK Discourse TreeBank for Chinese: Annotating Explicit Discourse Connectives for the Chinese TreeBank
نویسندگان
چکیده
The lack of open discourse corpus for Chinese brings limitations for many natural language processing tasks. In this work, we present the first open discourse treebank for Chinese, namely, the Discourse Treebank for Chinese (DTBC). At the current stage, we annotated explicit intra-sentence discourse connectives, their corresponding arguments and senses for all 890 documents of the Chinese Treebank 5. We started by analysing the characteristics of discourse annotation for Chinese, adapted the annotation scheme of Penn Discourse Treebank 2 (PDTB2) to Chinese language while maintaining the compatibility as far as possible. We made adjustments to 3 essential aspects according to the previous study of Chinese linguistics. They are sense hierarchy, argument scope and semantics of arguments. Agreement study showed that our annotation scheme could achieve highly reliable results.
منابع مشابه
Annotating Discourse Connectives In The Chinese Treebank
In this paper we examine the issues that arise from the annotation of the discourse connectives for the Chinese Discourse Treebank Project. This project is based on the same principles as the PDTB, a project that annotates the English discourse connectives in the Penn Treebank. The paper begins by outlining range of discourse connectives under consideration in this project and examines the dist...
متن کاملBuilding Chinese Discourse Corpus with Connective-driven Dependency Tree Structure
In this paper, we propose a Connectivedriven Dependency Tree (CDT) scheme to represent the discourse rhetorical structure in Chinese language, with elementary discourse units as leaf nodes and connectives as non-leaf nodes, largely motivated by the Penn Discourse Treebank and the Rhetorical Structure Theory. In particular, connectives are employed to directly represent the hierarchy of the tree...
متن کاملCross-Lingual Identification of Ambiguous Discourse Connectives for Resource-Poor Language
The lack of annotated corpora brings limitations in research of discourse classification for many languages. In this paper, we present the first effort towards recognizing ambiguities of discourse connectives, which is fundamental to discourse classification for resource-poor language such as Chinese. A language independent framework is proposed utilizing bilingual dictionaries, Penn Discourse ...
متن کاملProposition Bank II: Delving Deeper
The PropBank project is creating a corpus of text annotated with information about basic semantic propositions. PropBank I (Kingsbury & Palmer, 2002) added a layer of predicateargument information, or semantic roles, to the syntactic structures of the English Penn Treebank. This paper presents an overview of the second phase of PropBank Annotation, PropBank II, which is being applied to English...
متن کاملCOLING 2012 24 th International Conference on Computational Linguistics
s of invited position papers Remarks on some not so closed issues concerning discourse connectives Aravind Joshi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 Penn Discourse Treebank Relations and their Potential for Language Generation Kathleen McKeown . . . . . . . . . . . . . . . . . . . . . . . . ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014